
feat(pt_expt): dpa1(attn_layer=0) graph-native NeighborGraph forward #5583

Merged: wanghan-iapcm merged 69 commits into deepmodeling:master from wanghan-iapcm:feat-dpmodel-graph-dpa1 on Jun 29, 2026

wanghan-iapcm commented Jun 25, 2026

Summary

Adds the graph-native forward path for dpa1(attn_layer=0) (the factorizable, mixed-types case), built on the NeighborGraph foundation from #5581. Geometry enters the descriptor only through per-edge edge_vec; the neighbor-axis reduction becomes a segment_sum over edge centers. For pt_expt this becomes the default forward (force/virial via a single autograd backward through edge_vec).
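
To make the "neighbor-axis reduction becomes a segment_sum over edge centers" step concrete, here is a minimal plain-Python sketch. The helper `segment_sum` below is written for illustration only and is not the DeePMD-kit implementation; the toy edge list and feature values are assumptions.

```python
def segment_sum(values, segment_ids, num_segments):
    """Sum rows of `values` into `num_segments` buckets selected by `segment_ids`."""
    width = len(values[0]) if values else 0
    out = [[0.0] * width for _ in range(num_segments)]
    for row, seg in zip(values, segment_ids):
        for k, v in enumerate(row):
            out[seg][k] += v
    return out


# Three atoms, four directed edges; edge_center[e] is the center atom of edge e.
edge_center = [0, 0, 1, 2]
# One feature row per edge (illustrative numbers only).
edge_feat = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

per_atom = segment_sum(edge_feat, edge_center, num_segments=3)
print(per_atom)  # [[4.0, 6.0], [5.0, 6.0], [7.0, 8.0]]
```

In the dense layout the same reduction is a sum over a padded `(nloc, nnei)` neighbor axis; the edge form needs no padding because each atom receives exactly the edges that name it as center.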

What it adds

  • dpmodel: edge_env_mat (per-edge env-mat 4-vector), DescrptBlockSeAtten._call_graph + DescrptDPA1.call_graph, model call_lower_graph (energy), neighbor_graph_from_ijs + an optional ASE O(N) carry-all builder.
  • pt_expt: edge_energy_deriv (autograd grad(E, edge_vec) → edge_force_virial) + forward_common_lower_graph (energy + force + virial + atom_virial).
  • The dense DescrptDPA1.call becomes a thin adapter (from_dense_quartet → call_graph) preserving the 5-tuple ABI; a shape-static converter keeps it jax.jit / torch.export-traceable.
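
The step from a per-edge gradient to force and virial can be sketched as follows. This is a hedged illustration, not the PR's `edge_force_virial`: the function name `edge_force_virial_sketch`, the convention `edge_vec = r[dst] - r[src]` with `src` as the center atom, and the virial sign (`-sum_e edge_vec ⊗ g_e`) are all assumptions, and the toy harmonic energy stands in for the model so that `g_e` is known in closed form.

```python
def toy_energy(pos, edge_index):
    """Toy energy: 0.5 * |r_dst - r_src|^2 summed over edges (stand-in for the model)."""
    e = 0.0
    for src, dst in edge_index:
        e += 0.5 * sum((pos[dst][a] - pos[src][a]) ** 2 for a in range(3))
    return e


def edge_force_virial_sketch(g_e, edge_index, edge_vec, n_node):
    """Scatter g_e = dE/d(edge_vec) to per-atom forces and a total virial.

    Assumes edge_vec = r[dst] - r[src], so dE/dr[dst] = +g_e and dE/dr[src] = -g_e.
    """
    force = [[0.0] * 3 for _ in range(n_node)]
    virial = [[0.0] * 3 for _ in range(3)]
    for (src, dst), g, r in zip(edge_index, g_e, edge_vec):
        for a in range(3):
            force[src][a] += g[a]  # F = -dE/dr, and dE/dr[src] = -g_e
            force[dst][a] -= g[a]
            for b in range(3):
                virial[a][b] -= r[a] * g[b]  # assumed sign convention
    return force, virial


pos = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
edge_index = [(0, 1), (0, 2)]
edge_vec = [[pos[d][a] - pos[s][a] for a in range(3)] for s, d in edge_index]
g_e = [list(v) for v in edge_vec]  # for the toy energy, dE/d(edge_vec) = edge_vec

force, virial = edge_force_virial_sketch(g_e, edge_index, edge_vec, n_node=3)
print(force)   # [[1.0, 2.0, 0.0], [-1.0, 0.0, 0.0], [0.0, -2.0, 0.0]]
print(virial)  # [[-1.0, 0.0, 0.0], [0.0, -4.0, 0.0], [0.0, 0.0, 0.0]]
```

Because every edge contributes equal and opposite terms to its two endpoints, the forces sum to zero by construction, which is a cheap sanity check for any edge-based force path.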

Default behavior

  • pt_expt defaults graph-eligible dpa1(attn_layer=0, concat tebd, no exclude_types) models to the carry-all graph (it has the autograd force/virial path).
  • dpmodel/jax keep the dense default (they compute force/virial analytically; the graph lower is energy-only), and agree with pt_expt at non-binding sel.
  • Ineligible configs (attention, strip tebd, exclude_types, linear/ZBL) fall back to the dense path unchanged. neighbor_graph_method="legacy" forces dense; "dense"/"ase" force the graph.
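
The routing rules in this list can be summarized in a small decision function. This is only a sketch: `resolve_route`, its arguments, and its return strings are hypothetical and do not mirror the signature of `_resolve_graph_method`. Raising on an explicit graph request for an ineligible config is also an assumption; the description only states that ineligible configs fall back to dense under the default.

```python
def resolve_route(method, *, backend, attn_layer, tebd_input_mode,
                  exclude_types, has_descriptor=True):
    """Return "graph" or "legacy_dense" for a requested neighbor_graph_method.

    Hypothetical helper written for illustration; names are not DeePMD-kit API.
    """
    eligible = (
        has_descriptor               # Linear/ZBL models have no descriptor
        and attn_layer == 0          # attention is not graph-lowered yet
        and tebd_input_mode == "concat"
        and not exclude_types
    )
    if method == "legacy":
        return "legacy_dense"        # explicit opt-out
    if method in ("dense", "ase"):   # both name carry-all GRAPH builders
        if not eligible:
            # Assumption: an explicit graph request on an ineligible config errors.
            raise NotImplementedError("config is not graph-eligible")
        return "graph"
    if method is not None:
        raise ValueError(f"unknown neighbor_graph_method: {method!r}")
    # Default: only pt_expt has the autograd force/virial path on the graph.
    if backend == "pt_expt" and eligible:
        return "graph"
    return "legacy_dense"


cfg = dict(attn_layer=0, tebd_input_mode="concat", exclude_types=[])
print(resolve_route(None, backend="pt_expt", **cfg))      # graph
print(resolve_route(None, backend="dpmodel", **cfg))      # legacy_dense
print(resolve_route("legacy", backend="pt_expt", **cfg))  # legacy_dense
```

Note the naming trap the sketch makes explicit: the `"dense"` method selects a carry-all graph builder, not the dense neighbor-list lower.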

Parity (graph vs legacy dense lower, fp64 CPU)

|              | energy | force  | virial | atom_virial |
|--------------|--------|--------|--------|-------------|
| max abs diff | 0      | ~1e-19 | ~1e-18 | ~1e-18      |

atom_virial matches the canonical TF==pt-legacy full-to-src convention. dpa1 descriptor + model consistency suites green across dp/jax/pt_expt.

Known limitations

  • Default-flip is pt_expt-only; full carry-all default for dp/jax needs analytical/jax graph force (follow-up).
  • make_fx (forward + grad) traces; full .pt2 AOTI export is a follow-up (PR-B). The carry-all builders (build_neighbor_graph/from_ijs) still use nonzero (eager-only); their static variants land with the export PR.
  • Single-rank only; CUDA unvalidated (CPU box); ASE is opt-in O(N) (vesin O(N) is a follow-up); no jax graph force / dpa2-3 message-passing yet.

Also folds in three follow-up fixes to the #5581 foundation from @OutisLi's review (dangling spec refs → design discussion, edge_force_virial jax int-sum short-circuit, Array typing).

Summary by CodeRabbit

  • New Features
    • Added graph-native “lowering” for DPA1 when compatible, including graph-native descriptor/forward execution and graph-native descriptor→model output conversion.
    • Introduced opt-in neighbor_graph_method routing for energy/force/virial, with carry-all neighbor graphs and graph-output fitting/post-processing.
    • Added new neighbor-graph utilities (including ASE-based carry-all building, (i,j,S) conversion, and per-edge environment-matrix computation), exported as part of the public API.
  • Bug Fixes
    • Improved stability for masked/padded edges, virtual atom handling, and parameter protection consistency; refined traced virial assembly when node-capacity is used.
  • Tests
    • Expanded parity/regression suites for graph lowering, energy/force/virial, conversion correctness, ragged graphs, and FX tracing.

Han Wang added 26 commits June 25, 2026 17:26
…_graph

The dense path masks excluded type pairs; the graph path does not yet, so
raise NotImplementedError instead of silently diverging.
serialize roundtrip + dpmodel->pt_expt interop on the attn_layer=0 graph path
are already covered by test_dpa1.py::test_consistency (lines 86-113), which
routes through the graph forward via the Task-3 dense-call adapter.
…all back to dense

Task 3's adapter routed ALL attn_layer==0 through the graph, but the graph
only supports tebd_input_mode='concat', no exclude_types, and needs mapping
for ghosts. strip-mode / exclude / mapping-None-with-ghosts attn_layer=0
models raised/IndexError'd. uses_graph_lower() now encodes full eligibility
and ineligible configs fall back to the legacy dense body unchanged.
Fixes test_compressed_forward (attn_layer=0 strip).
…pt graph mask key; legacy opt-out in Option-B test

- _resolve_graph_method/_call_common_graph use getattr(atomic_model,'descriptor',None)
  so Linear/ZBL models (no descriptor) fall back to dense instead of AttributeError
- pt_expt _call_common_graph override adds the all-ones mask key for dense parity
- test_dpa1_graph_model_energy dense refs use neighbor_graph_method='legacy'
  to opt out of the now-default carry-all graph (decision deepmodeling#17 default-flip)
…nse default

dpmodel/jax compute force/virial analytically inside call_common (energy_derv_r);
the energy-only graph lower drops it -> KeyError when force is requested. Only
pt_expt has the autograd graph force/virial path, so only pt_expt defaults
eligible models to the graph. dpmodel base _resolve_graph_method no longer
auto-routes; pt_expt overrides it to re-enable AUTO.
…x int-sum, Array typing)

- swap dangling memory/spec_unified_edge_nlist.md refs -> public design
  discussion (#4) so the references resolve
- edge_force_virial: short-circuit n_out=int(node_capacity) when supplied so
  the static jax/export path never calls int() on a traced sum(n_node)
- derivatives.py: move Array import under TYPE_CHECKING (+ from __future__
  import annotations) for subpackage uniformity
@wanghan-iapcm wanghan-iapcm requested a review from iProzd June 25, 2026 09:43
Comment thread source/tests/pt_expt/model/test_dpa1_graph_lower.py Fixed
@wanghan-iapcm wanghan-iapcm marked this pull request as ready for review June 27, 2026 01:05
@dosubot dosubot Bot added the enhancement label Jun 27, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deepmd/dpmodel/utils/neighbor_graph/graph.py`:
- Around line 122-127: The helper frame_id_from_n_node still converts n_node to
a Python int via int(xp.sum(n_node)), which breaks tracing for symbolic inputs.
Update frame_id_from_n_node to avoid deriving n_total from a runtime sum and
instead accept a static node count/capacity, following the same export-safe
pattern used by node_validity_mask. Keep the rest of the boundary/searchsorted
logic in place, but ensure all shape-related values come from static inputs so
the function remains trace-friendly.

📥 Commits

Reviewing files that changed from the base of the PR and between 58ef3fb and e8e9885.

📒 Files selected for processing (23)
  • deepmd/dpmodel/atomic_model/base_atomic_model.py
  • deepmd/dpmodel/atomic_model/dp_atomic_model.py
  • deepmd/dpmodel/atomic_model/polar_atomic_model.py
  • deepmd/dpmodel/descriptor/dpa1.py
  • deepmd/dpmodel/fitting/general_fitting.py
  • deepmd/dpmodel/model/make_model.py
  • deepmd/dpmodel/model/transform_output.py
  • deepmd/dpmodel/utils/exclude_mask.py
  • deepmd/dpmodel/utils/neighbor_graph/__init__.py
  • deepmd/dpmodel/utils/neighbor_graph/graph.py
  • deepmd/pt_expt/model/edge_transform_output.py
  • deepmd/pt_expt/model/make_model.py
  • source/tests/common/dpmodel/case_single_frame_with_nlist.py
  • source/tests/common/dpmodel/test_dpa1_call_graph_descriptor.py
  • source/tests/common/dpmodel/test_edge_env_mat.py
  • source/tests/common/dpmodel/test_fitting_call_graph.py
  • source/tests/common/dpmodel/test_graph_atomic_parity.py
  • source/tests/common/dpmodel/test_graph_ragged.py
  • source/tests/common/test_mixins.py
  • source/tests/pd/model/test_env_mat.py
  • source/tests/pt_expt/model/test_dpa1_graph_lower.py
  • source/tests/pt_expt/model/test_graph_ragged.py
  • source/tests/universal/common/cases/cases.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • deepmd/dpmodel/utils/neighbor_graph/__init__.py
  • deepmd/pt_expt/model/edge_transform_output.py
  • deepmd/dpmodel/descriptor/dpa1.py

Comment thread deepmd/dpmodel/utils/neighbor_graph/graph.py Outdated
Han Wang added 6 commits June 28, 2026 01:47
…ude non-vacuity; unify polar eye tiling

- forward_atomic_graph fparam-by-frame_id dispatch now UTed (graph==dense
  1e-12 + per-frame fparam differs) [review #2]
- pair-exclude non-vacuity toggles pair_excl on the SAME model weights
  (isolates exclusion from weights) [review #1]
- polar apply_out_stat eye tiling unified to xp.tile(eye, (*atype.shape,1,1))
  (drops the ndim==2 if/else) [review #3]
…; move dpmodel transform to edge_transform_output.py [review deepmodeling#8,deepmodeling#10]

- fit_output_to_model_output_graph now takes the NeighborGraph instead of
  n_node (dpmodel) / edge_vec+edge_index+edge_mask+n_node (pt_expt); the
  pt_expt autograd leaf is graph.edge_vec. Unifies the two signatures.
- dpmodel fit_output_to_model_output_graph moved transform_output.py ->
  new edge_transform_output.py (mirrors the pt_expt file layout).
- tighten pair-exclude non-vacuity tolerance (1e-9; the (0,1) effect is ~2e-6).
…per + alias) [review deepmodeling#6,deepmodeling#7,deepmodeling#9]

Mirror the dense lower structure for the graph path:
- NEW model-level forward_common_atomic_graph (builds NeighborGraph + atomic
  forward_common_atomic_graph + flat-N output transform) -- analogue of the
  dense forward_common_atomic; the graph build is no longer inlined in the
  lower [deepmodeling#6].
- call_lower_graph -> public call_common_lower_graph WITH _input/_output_type_cast
  (edge_vec is the geometry in place of coord), making it a directly-callable
  PRIMARY interface per spec decision deepmodeling#14 [deepmodeling#7].
- call_lower_graph = call_common_lower_graph alias (mirrors call_lower =
  call_common_lower) [deepmodeling#9].
The TestCompiledVaryingNatoms dpa1(attn_layer=0) case failed: the uncompiled
reference uses the pt_expt carry-all GRAPH forward (default-flip deepmodeling#17) while the
compiled forward_lower uses the sel-capped DENSE forward. Those are two
different force computations -- even at non-binding sel the forward matches to
~1e-16 but their backward gradients agree only to fp64 accumulation (~1e-12),
which the optimizer amplifies into a diverging training trajectory (weight
drift ~1e-3 after one step). It is NOT sel-binding and NOT a torch.compile
dynamic-shape bug.

Pin BOTH sides to the legacy dense env-mat path via force_legacy_descriptor=True
(monkeypatch descriptor.uses_graph_lower -> False, killing both the default-flip
and the _call_graph_adapter), so this stays a true compile-correctness check on
the path it actually compiles. Compiling the GRAPH lower so eager==compiled is
tracked for PR-B.
…tion

Add the missing Parameters/Returns sections (and fill incomplete ones) on the
NeighborGraph / graph-lower functions so they match the package numpydoc style:

- dpa1: _call_graph_adapter, _call_dense (Parameters+Returns)
- general_fitting.call_graph: add missing g2, h2 params
- neighbor_graph: pad_and_guard_edges, node_validity_mask (Parameters+Returns);
  from_dense_quartet, build_neighbor_graph_ase (Returns); edge_force_virial
  (add g_e/edge_vec/edge_index/edge_mask params)
- dpmodel/pt_expt make_model: _resolve_graph_method, _call_common_graph
  (Parameters+Returns); call_common_lower_graph (replace "Parameters mirror ..."
  cross-ref with an explicit Parameters section)
- pt_expt edge_transform_output: edge_energy_deriv (Parameters+Returns);
  fit_output_to_model_output_graph (Returns)

Docstring-only; no behavior change.
Comment thread deepmd/dpmodel/model/make_model.py
Comment thread deepmd/dpmodel/model/make_model.py
Comment thread deepmd/dpmodel/utils/neighbor_graph/builder.py
Comment thread deepmd/dpmodel/model/make_model.py
Comment thread deepmd/dpmodel/utils/neighbor_graph/graph.py Outdated
Comment thread deepmd/dpmodel/atomic_model/dp_atomic_model.py
Han Wang and others added 3 commits June 28, 2026 15:58
- call_common: an explicit `neighbor_list` (a dense-nlist strategy) is no longer
  silently ignored by the graph default. Raise on `neighbor_list` + explicit
  `neighbor_graph_method`; otherwise honor the nlist by taking the dense route.
- frame_id_from_n_node: accept an optional static `n_total` (jax/export
  trace-friendly, avoids `int(sum(n_node))`); clamp padding nodes to the last
  frame so a padded node axis stays in range for segment_sum.
- thread `charge_spin` (accept-for-ABI-stability, like comm_dict/n_local)
  through the graph interface: forward_atomic_graph, forward_common_atomic_graph,
  call_common_lower_graph, forward_common_lower_graph.
- docs: list neighbor_graph_method options one per line incl. "legacy", clarify
  "dense"/"ase" are carry-all GRAPH builders (not the dense nlist lower);
  contrast from_dense_quartet (legacy-quartet adapter, keeps sel truncation) vs
  the carry-all builders.

Tests: neighbor_list conflict-raise + dense-route fallback; frame_id static
n_total (exact + padded).
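
The `frame_id_from_n_node` change described above (optional static `n_total`, padding nodes clamped to the last frame) can be illustrated with a plain-Python sketch. The function `frame_id_sketch` is hypothetical and uses `bisect` in place of the array `searchsorted`; plain Python has no tracing, so the sketch only shows the indexing logic, not the export-safety aspect.

```python
from bisect import bisect_right
from itertools import accumulate


def frame_id_sketch(n_node, n_total=None):
    """Map each flat node index to its frame; extra padded nodes go to the last frame."""
    ends = list(accumulate(n_node))  # exclusive end index of each frame
    if n_total is None:
        # Eager fallback; the real code avoids this sum on traced inputs.
        n_total = ends[-1] if ends else 0
    last = max(len(n_node) - 1, 0)
    # bisect_right counts how many frame ends are <= k, i.e. the frame of node k;
    # the min() clamps padding nodes so the id stays in range for segment_sum.
    return [min(bisect_right(ends, k), last) for k in range(n_total)]


print(frame_id_sketch([2, 3]))             # [0, 0, 1, 1, 1]
print(frame_id_sketch([2, 3], n_total=7))  # [0, 0, 1, 1, 1, 1, 1]
```

Passing `n_total` as a static capacity keeps every shape-determining value out of the data path, which is the property the review comment asked for.
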
dpa1 does not consume charge_spin (get_dim_chg_spin()==0; the dense atomic model
passes None to the descriptor since add_chg_spin_ebd is False). charge_spin is
accepted on the graph lower only for ABI stability with charge/spin-conditioned
descriptors (dpa3/dpa4, PR-G). Pin that the dpa1 graph lower output is INVARIANT
to charge_spin:
- dpmodel call_common_lower_graph: energy/atom_energy/mask unchanged.
- pt_expt forward_common_lower_graph: energy/force/virial/atom_virial unchanged.

With the existing graph==dense parity at non-binding sel this gives the full
claim graph(charge_spin) == graph(None) == dense. Guards against a future
regression where charge_spin leaks into the dpa1 graph path.
@wanghan-iapcm wanghan-iapcm requested a review from iProzd June 28, 2026 08:10
CodeQL flagged the unused local `N = nf * nloc`; fold it into the comment.
@wanghan-iapcm wanghan-iapcm enabled auto-merge June 28, 2026 13:15
@wanghan-iapcm wanghan-iapcm added this pull request to the merge queue Jun 28, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 28, 2026
@wanghan-iapcm wanghan-iapcm added this pull request to the merge queue Jun 29, 2026
Merged via the queue into deepmodeling:master with commit 5082854 Jun 29, 2026
70 checks passed
@wanghan-iapcm wanghan-iapcm deleted the feat-dpmodel-graph-dpa1 branch June 29, 2026 06:50
wccc-phys pushed a commit to wccc-phys/deepmd-kit that referenced this pull request Jul 2, 2026
…C++ inference single & multi-rank (NeighborGraph PR-B) (deepmodeling#5604)

## NeighborGraph PR-B — graph `.pt2` export, compiled training, and C++
inference (single & multi-rank)

This PR spans the full PR-B: **B1** (Python: graph `.pt2` export +
compiled training on the graph lower), **B2** (C++ single-rank inference
of the graph `.pt2`, dynamic edge axis), and **B3** (C++/LAMMPS
multi-rank). Built on the merged PR-A (deepmodeling#5583). Scope: dpa1,
`attn_layer=0`, pt_expt.

### B1 — graph `.pt2` export + compiled training (Python)
- `forward_common_lower_graph_exportable` trace target;
`serialization.py` graph export branch (`lower_kind="graph"`,
`lower_input_kind` metadata); `_eval_model_graph` DeepEval dispatch
(parity vs eager dpa1 **1e-10 pbc+nopbc**).
- **Compiled training retargeted to the graph lower so eager ==
compiled** (the MUST-FIX) → `force_legacy_descriptor` deleted. Root
cause was a real dpa1 `call_graph` autograd **detach** bug
(`xp.asarray(tebd, device=)` drops the tebd-net gradient under torch);
fixed.

### B2 — C++ graph ingestion (dynamic edge axis, single-rank)
- Graph `.pt2` uses a **dynamic edge axis** (`Dim("nedge", min=2)`) —
one artifact evals any system size (proven across 56- and 380-edge
systems at 1e-10), no C++ capacity ceiling.
- C++ `DeepPotPTExpt`: `lower_input_is_graph_` + `run_model_graph`
(NeighborGraph ABI: `atype, n_node, edge_index, edge_vec, edge_mask, …`)
+ `buildGraphTensors` (mirrors the deepmodeling#5562 edge path; node types from
`atype_ext`); `remap_graph_outputs_to_dense_keys` (single-rank).
- gtest: 5 cases × {double,float} = 10/10 (build-nlist parity, dynamic-E
2nd size, `ago>0`, tiny system, atomic-overload). The review process
caught two bugs that would otherwise have shipped: an `ago>0` heap-OOB
(by inspection) and a public-vs-internal output-key mismatch (at
runtime).

### B3 — multi-rank C++ / LAMMPS (non-MP)
- **dpa1 is non-message-passing ⇒ multi-rank needs NO
`border_op`/with-comm artifact** (that is a message-passing concern,
deferred to PR-G). Multi-rank reuses the **same single-rank graph
`.pt2`**, fed an **extended-region graph**
(`buildGraphTensors(fold_to_local=false)`, `N=nall`, ghost node types
from `atype_ext` incl. halo), with owned energy =
`sum(atom_energy[0:nloc])` and the extended force folded to owners
through the **existing dense `select_map` reverse-comm**. The fail-fast
for `graph && multi_rank && has_message_passing` is retained.
- **Validated locally on multi-CPU** (no GPU needed for correctness):
`test_lammps_dpa1_graph_pt2.py` — single-rank vs reference, `mpirun -n
2` ≡ single-rank (energy + per-atom force + virial, atol 1e-8), plus an
empty-subdomain (`nloc=0`) corner. Single-rank gtests stay 10/10
(multi-rank is purely additive). Multi-rank matched single-rank on the
first run.
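
The owned-energy and force-folding bookkeeping described in B3 can be sketched in a single process as follows. This is an analogue only: in LAMMPS the fold happens through reverse communication between ranks, whereas here a local `mapping` from extended index to owner index stands in for it. The helper names and the toy numbers are assumptions.

```python
def owned_energy(atom_energy, nloc):
    """Energy owned by this rank: only the first nloc (local) atoms count."""
    return sum(atom_energy[:nloc])


def fold_force_to_owners(force_ext, mapping, nloc):
    """Accumulate forces on extended atoms (local + ghost) onto their owners.

    mapping[k] is the owner's local index for extended atom k (hypothetical,
    single-process stand-in for the reverse-comm step).
    """
    folded = [[0.0] * 3 for _ in range(nloc)]
    for f, owner in zip(force_ext, mapping):
        for a in range(3):
            folded[owner][a] += f[a]
    return folded


nloc = 2
atom_energy = [-1.0, -2.0, -0.5, -0.25]  # nall = 4: two local atoms, two ghosts
mapping = [0, 1, 1, 0]                   # ghosts 2 and 3 are images of atoms 1 and 0
force_ext = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [2.0, 0.0, 0.0]]

print(owned_energy(atom_energy, nloc))                 # -3.0
print(fold_force_to_owners(force_ext, mapping, nloc))  # [[3.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
```

Ghost energies are dropped rather than folded because each ghost is a local atom on some rank and is counted there once; ghost forces must be folded because the local energy depends on ghost positions.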

### Tests / known limitations
- Per-task + whole-phase reviews all Ready-to-merge.
- **pt_expt-only; dpa1 (non-MP) only.** Follow-ons: **PR-C** vesin/nv
O(N) builders (carry-all builders still use `nonzero`, eager-only),
**PR-D** attention, **PR-E** angles, **PR-F** jax graph force, **PR-G**
dpa2/3 message-passing (forward halo + with-comm). CUDA multi-rank
unvalidated locally. Carried code-cleanup follow-ups: a ~60-line DRY
duplication in `training.py`; the multi-rank *atomic* output branch has
no direct gtest (covered indirectly by the mpirun per-atom-virial
assertion, since a single-process gtest can't set `nprocs>1`).


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Summary

* **New Features**
  * Added support for graph-schema (NeighborGraph) model archives with a
    selectable `lower_kind="graph"` export path, including CLI support and
    new graph-form inference handling.
  * Added static edge-capacity support during graph construction.

* **Bug Fixes**
  * Improved gradient continuity for type embeddings in graph mode.
  * Enhanced trace/export stability by preventing out-of-range graph
    indices/frame IDs and making scatter/frame sizing more consistent.

* **Tests**
  * Added/extended parity, export metadata, training, and LAMMPS
    single-/multi-rank validation for graph-form `.pt2`, plus metadata
    checks for `lower_input_kind`.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Han Wang <wang_han@iapcm.ac.cn>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

4 participants